Punc'data is a tool to display and visualize high-resolution mass spectrometry data from complex mixtures
First time? You can download a test file called tesdata_cellulose.csv at
https://github.com/WTVoe/puncdata
1) Upload your file on the "upload" tab. Make sure the number of columns is correct. If not, the wrong separator might be selected in "upload parameters" (gear icon)
2) You may go to the "table" tab to check that the file has been imported correctly
3) If you are using Punc'data for the first time with this data table, you need to check in the parameters tab if all the columns (intensity, m/z ratio...) are correct
4) From there, you can go to any of the other operations presented down below
5) For quick charts, go to canvas A, press the right button "Premade Canvas" and press "Validate"
6) If you have already created parameters, you can import them from the canvas and parameters tabs.
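The column check in step 1 can be reproduced outside Punc'data before uploading. The sketch below is only an illustration: the function names, the candidate separators and the header line are assumptions, and the naive split ignores quoted fields.

```python
def count_columns(line, separator):
    """Number of columns obtained when splitting one line on a separator."""
    return len(line.rstrip("\r\n").split(separator))


def guess_separator(header_line, candidates=(",", ";", "\t")):
    """Return the candidate separator that yields the most columns."""
    return max(candidates, key=lambda sep: count_columns(header_line, sep))


# Hypothetical header line of a peak list
header = "m/z;intensity;formula"
print(count_columns(header, ","))  # a single column signals a wrong separator
print(guess_separator(header))
```

If the expected separator gives a single column, the file most likely uses another one.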
1) Select the file you want to calibrate in the left-hand menu of the "calibration" tab, under "File to calibrate"
2) If you have already worked with Punc'data, you can import your calibration list and parameters using the right button "import config"
3) Edit the calibrants list by pressing the button "Edit Calibration list". Alternatively, you can press "Paste" and paste it from a spreadsheet (list of formulas, masses or both)
4) You may also add any calibrant with a repeat unit within a definable mass range under the menu "Edit Calibration list">"Add a polymer/..."
5) Once you have configured your parameters, press "Compute". The status should appear in the text box below, accompanied by charts
6) Once you are satisfied with your calibration, press "Save Calibrated data". You will be prompted either to override your previous dataset or to create a new one.
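To illustrate step 4, a series of calibrants built from a repeat unit can be generated as follows. This is a minimal sketch and not Punc'data's own code: the function name is hypothetical, ions are assumed to be singly charged, and the sodiated polyethylene glycol series is only an example.

```python
def polymer_series(repeat_mass, end_mass, mz_min, mz_max):
    """List (n, m/z) for singly charged ions end_mass + n * repeat_mass in range."""
    if repeat_mass <= 0:
        raise ValueError("repeat_mass must be positive")
    series = []
    n = 0
    while end_mass + n * repeat_mass <= mz_max:
        mz = end_mass + n * repeat_mass
        if mz >= mz_min:
            series.append((n, round(mz, 5)))
        n += 1
    return series


# Hypothetical example: sodiated polyethylene glycol, H(C2H4O)nOH + Na+
C2H4O = 2 * 12.0 + 4 * 1.00782503 + 15.99491462
H2O_NA_PLUS = 2 * 1.00782503 + 15.99491462 + 22.98976928 - 0.00054858
for n, mz in polymer_series(C2H4O, H2O_NA_PLUS, 400.0, 500.0):
    print(n, mz)
```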
Main procedure
1) On the "Attribution" tab, choose the file you want to assign in the leftmost menu
2) In the leftmost menu, choose an attribution preset. Alternatively, you may check/uncheck steps of the procedure in the middle menu
3) Configure each sub-menu in the middle, step by step, by clicking on the titles: "isotopy", "seeds"...
4) If you have already worked with Punc'data, you can import your previous parameters with the rightmost button "import config"
5) Once configured, press "Compute". You will be informed of the progress of the computation step by step. Once it has finished, charts will appear to display your results
6) If you are satisfied, press "Validate results" to either replace your dataset or create a new one
Configurations
Isotopy
The isotopy parameters will tag isotopic peaks before attribution. Intensity of the isotopic distribution will be remembered by the algorithm and can be saved at the end of the attribution process
- To find the right tagging tolerance for your dataset, press "Visualize mass delta" on the leftmost menu, and press "draw" for 13C
- With well-calibrated data, the distribution of mass differences should look like a bell curve. You can modify the half-window tolerance (orange bars) to find the correct width for your data and adapt your tolerance accordingly once you leave this menu
You can add more isotopic distributions that need to be tagged under the menu "Choose elements">"Advanced edition"
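The idea behind 13C tagging can be sketched in a few lines. This is an assumption-laden illustration, not the algorithm used by Punc'data: ions are assumed to be singly charged, the half-window is expressed in Da, intensities are ignored, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right

C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def tag_isotopic_peaks(mz_values, delta=C13_C12_DELTA, half_window=0.001):
    """Return {index of isotopic peak: index of its parent peak}.

    mz_values must be sorted in ascending order; half_window is in Da.
    """
    tags = {}
    for parent, mz in enumerate(mz_values):
        lo = bisect_left(mz_values, mz + delta - half_window)
        hi = bisect_right(mz_values, mz + delta + half_window)
        for child in range(lo, hi):
            tags.setdefault(child, parent)  # keep the first parent found
    return tags


peaks = [179.05611, 180.05947, 181.07100]
print(tag_isotopic_peaks(peaks))
```

A wider half-window tags more peaks but increases the risk of tagging unrelated peaks, which is why the width of the bell curve is a useful guide.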
Seeds
Seeds are the first formulae searched in your dataset. You can name them (optional) and give their ionic formula. Don't forget the charge (write + or - at the end)
You can also paste your seed formulae from a spreadsheet (one line per formula). If there is one column, it is taken as the formula; if there are two, the first one is the name and the second one the formula
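The pasting rule above (one column: formula; two columns: name, then formula) could be expressed as follows. This is a sketch under assumptions: cells are taken to be tab-separated, as when copying from a spreadsheet, and the function name and example seeds are hypothetical.

```python
def parse_seed_lines(pasted_text):
    """Turn pasted spreadsheet rows into (name, ionic_formula) pairs."""
    seeds = []
    for line in pasted_text.splitlines():
        cells = [cell.strip() for cell in line.split("\t")]
        cells = [cell for cell in cells if cell]
        if not cells:
            continue  # skip empty rows
        if len(cells) == 1:
            name, formula = "", cells[0]
        else:
            name, formula = cells[0], cells[1]
        if not formula.endswith(("+", "-")):
            raise ValueError(f"missing charge sign in {formula!r}")
        seeds.append((name, formula))
    return seeds


print(parse_seed_lines("C6H11O6-\nglucose adduct\tC6H12O6Na+"))
```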
Supervized Network
This method, based on mass differences, is useful for NOM (Natural Organic Matter), DOM (Dissolved Organic Matter), bio-oils, petroleum, polymers...
- Customize the mass differences that will be searched under the menu "Customize directed network units". If you are not sure what to use yet, first try the Unsupervized network (advised only for fewer than 8000 peaks)
- To choose the tagging tolerance that suits your dataset, press "Visualize mass delta" on the leftmost menu, select one of your mass differences and press "draw"
- With well-calibrated data, the distribution of mass differences should look like a bell curve. You can modify the half-window tolerance (orange bars) to find the correct width for your data and adapt your tolerance accordingly once you leave this menu
- The network exploration tactic doesn't affect your results if there are no errors in the network. Change this parameter only if you want to verify that the assignment process is path-independent
- The "tolerance of attribution" is the final tolerance for your assignments. It is advised to put a value slightly higher than the one used under the "passes" menu
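As an illustration of how a mass-difference network links peaks, the sketch below lists every pair of peaks separated by one of the chosen units. It is not Punc'data's implementation: the tolerance is assumed to be in Da, the unit masses are rounded monoisotopic values, the function name is hypothetical, and the quadratic loop is only suitable for small peak lists.

```python
def find_edges(mz_values, units, tolerance=0.001):
    """Return (i, j, unit_name) for peaks with mz[j] - mz[i] matching a unit."""
    edges = []
    for i, low in enumerate(mz_values):
        for j, high in enumerate(mz_values):
            for name, delta in units.items():
                if abs((high - low) - delta) <= tolerance:
                    edges.append((i, j, name))
    return edges


units = {"CH2": 14.01565, "O": 15.99491}
peaks = [179.05611, 193.07176, 195.05103]
print(find_edges(peaks, units))
```

Starting from a seed, each edge lets a formula be propagated by adding or removing the corresponding unit.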
Unsupervized Network
The unsupervized network is a method for automatic lookup of the most abundant mass differences. The default parameters are well-suited for the mass accuracy of (calibrated) data acquired using an FT-ICR MS.
- The bounds indicate the mass range within which mass differences are searched
- The option "Keep" indicates how many of the most common mass differences should be kept to construct a network
- The two buttons "custom attrib DB" and "custom attrib pass" are used to assign a neutral chemical formula to the mass differences that were kept
- "Delta attribution tolerance" is the tolerance for assigning these mass differences
- You can check which mass differences were found and assigned after pressing "compute" and looking under the middle menu "Log results">"Log deltas summary"
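The lookup of the most common mass differences can be pictured with the following sketch. The binning by rounding, the parameter names and the example peaks are assumptions made for illustration; Punc'data may count differences in another way.

```python
from collections import Counter
from itertools import combinations


def most_common_deltas(mz_values, lower, upper, keep, decimals=3):
    """Count pairwise mass differences within [lower, upper] and keep the top ones."""
    counts = Counter()
    for a, b in combinations(sorted(mz_values), 2):
        delta = b - a
        if lower <= delta <= upper:
            counts[round(delta, decimals)] += 1
    return counts.most_common(keep)


peaks = [100.0, 114.01565, 128.0313, 142.04695, 115.99491]
print(most_common_deltas(peaks, lower=10, upper=20, keep=2))
```

The number of pairs grows with the square of the number of peaks, which is consistent with the advice to reserve this method for smaller datasets.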
Passes
Passes are lists of elements and their bounds. They define a list of possible chemical/ionic formulae whose masses are matched against the m/z values in your dataset
Start by defining the charge (+1/-1) of the expected ions. Punc'data does not yet handle multiply charged ions
"Ion type" refers to radical/non-radical ions. Set this parameter according to your ionization source and sample
You can then define one or multiple passes. The algorithm will try to assign every peak with your first pass, then move on to the second pass with only the remaining peaks
You can give a name to every pass and define their max m/z value. If you do not put any max m/z value, it will be automatically defined as the largest m/z value in your dataset
Potential metal/halogen adducts (Na+, Cl-...) should be added as elements in the passes
Avoid putting too many elements in a single pass, as this will drastically increase the computation time
You can copy/paste passes from a spreadsheet: one line for each element, with one column each for the element, min and max
By pressing "Override default filters" you will be able to change the filters only for the currently edited pass. You need to click the top checkbox on the popup menu to activate it
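The sketch below illustrates what a pass does in principle: it enumerates every formula allowed by the element bounds and keeps those whose ion m/z matches a peak, and later passes only see the peaks left unassigned. It is a brute-force illustration, not Punc'data's algorithm; the function names, the ppm tolerance and the example peaks are assumptions, and the element masses are rounded monoisotopic values.

```python
from itertools import product

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858


def ion_mz(counts, charge):
    """m/z of a singly charged ion (charge is +1 or -1) given its ionic formula."""
    mass = sum(MONOISOTOPIC[el] * n for el, n in counts.items())
    return mass - charge * ELECTRON  # remove an electron for +1, add one for -1


def run_pass(mz, bounds, charge, tol_ppm=1.0):
    """Return every formula within the element bounds that matches mz."""
    elements = list(bounds)
    ranges = [range(lo, hi + 1) for lo, hi in bounds.values()]
    hits = []
    for combo in product(*ranges):
        counts = dict(zip(elements, combo))
        if abs(ion_mz(counts, charge) - mz) / mz * 1e6 <= tol_ppm:
            hits.append(counts)
    return hits


def assign(peaks, passes, charge):
    """Try each pass in order; later passes only see still-unassigned peaks."""
    assigned, remaining = {}, list(peaks)
    for bounds in passes:
        still_unassigned = []
        for mz in remaining:
            hits = run_pass(mz, bounds, charge)
            if hits:
                assigned[mz] = hits[0]
            else:
                still_unassigned.append(mz)
        remaining = still_unassigned
    return assigned, remaining


cho = {"C": (0, 10), "H": (0, 20), "O": (0, 8)}
chno = {"C": (0, 10), "H": (0, 20), "N": (0, 2), "O": (0, 8)}
peaks = [179.05611, ion_mz({"C": 5, "H": 8, "N": 1, "O": 4}, -1)]
print(assign(peaks, [cho, chno], charge=-1))
```

The number of candidate formulae is the product of the ranges of all elements, so each extra element multiplies the work. This is why a pass with many elements is slow.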
Filtering
Filters limit the possible assigned formulae. The default values are inspired by the work of Kind and Fiehn (DOI: 10.1186/1471-2105-8-105)
"KMD bounds" is an additional parameter that restricts assignments to a certain range of KMD values. The KMD unit can be defined in the parentheses.
Other considerations
- Under "Posttreatment" there are options to try to find which element in the ionic formula is the adduct
- Under "Posttreatment" you can modify the configuration of the output results table: you can change the column order to best suit your needs
- "Log results" can help you monitor how your dataset was segmented and the network results
- "Log results" can also show you the output table without validating the assignments (press "log attributions")
- The right menu helps you customize the pie chart and the three other charts displayed. Their type can be changed.
Treat data
1) If you want to treat data (delete peaks, parse formulas, add special columns), go to the treatment tab
2) In the treatment tab, the selection menu lets you select an operation, and the bottom part selects which file the operation will be applied to.
3) The operation "Parse a formula" will parse the column "formula" of your dataset into separate columns, one for each element. It can also compute ratios such as O/C (customizable).
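What parsing a formula amounts to can be sketched as follows. This is a simplified illustration with hypothetical function names: it handles plain formulae such as C6H12O6 or C6H12O6Na+, but not parentheses or isotope labels.

```python
import re

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Split a formula such as 'C6H12O6' into a dict of element counts."""
    counts = {}
    for element, number in TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def ratio(counts, numerator="O", denominator="C"):
    """Element ratio such as O/C; None when the denominator is absent."""
    if not counts.get(denominator):
        return None
    return counts.get(numerator, 0) / counts[denominator]


counts = parse_formula("C6H12O6Na+")
print(counts, ratio(counts), ratio(counts, "H", "C"))
```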
Matrix and data comparison
1) To compare samples, you can either go directly to the Venn tab and compare them, or make a Matrix and then compare them by PCA (skip to 5) ).
2) On the Venn tab, you can customize appearance and files selected. At most, you can compare 4 samples simultaneously. You can compare them based on their formula or based on their m/z ratio.
3) On the Venn tab, hovering over the text inside the circles will display additional data. By clicking on one, you will be able to export the data to a spreadsheet.
4) Each intersection will be available as an independent file on the canvas tabs.
5) To make a Matrix, select a list of files (they must have the same column order) and set up every parameter, then press "compute"
6) Once a Matrix is computed, it can be exported or is directly available in most other menus for treatment or display under the name "Matrix"
7) On the "PCA" tab you can select a matrix and press compute. A matrix is either a Matrix just computed by Punc'data or a file defined as a Matrix in the "upload" tab
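The regions of a Venn diagram correspond to set operations on the compared keys (formulae or m/z values). The sketch below groups each key by the exact combination of samples that contain it; the function name and the sample contents are hypothetical.

```python
def venn_regions(samples):
    """Map each combination of sample names to the keys found in exactly those samples."""
    regions = {}
    all_keys = set().union(*samples.values())
    for key in all_keys:
        members = frozenset(name for name, keys in samples.items() if key in keys)
        regions.setdefault(members, set()).add(key)
    return regions


samples = {
    "A": {"C6H12O6", "C5H10O5"},
    "B": {"C6H12O6", "C7H14O7"},
}
for members, keys in venn_regions(samples).items():
    print(sorted(members), sorted(keys))
```

Comparing on m/z values instead of formulae would require a tolerance rather than exact equality, which this sketch does not cover.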
Warning
•When comparing files, check that the columns are the same for every sample and that they match with the values in the parameters tab
•When comparing files, also check that all files have the same total number of columns, even if they are not used
•When comparing multiple files at the same time, beware of the current interactive selection: you can select only one dataset at a time, or all of them simultaneously
•When using the matrix/Venn tool with a comparison based on the formula, pay attention to which formula is used: if you use the molecular formula (and not the ion formula), several ions can share the same formula and be wrongly merged together. The algorithm doesn't handle the case where multiple lines refer to the same molecular formula
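Before a formula-based comparison, a dataset can be checked for lines that share the same molecular formula. The sketch below is a hypothetical pre-check, not a Punc'data feature; rows are assumed to be (m/z, molecular formula) pairs.

```python
from collections import defaultdict


def duplicated_formulas(rows):
    """Return {formula: [m/z, ...]} for formulae carried by more than one line."""
    groups = defaultdict(list)
    for mz, formula in rows:
        groups[formula].append(mz)
    return {formula: mzs for formula, mzs in groups.items() if len(mzs) > 1}


# Hypothetical rows: the same molecule detected as two different ions
rows = [
    (181.07066, "C6H12O6"),
    (203.05261, "C6H12O6"),
    (151.06010, "C5H10O5"),
]
print(duplicated_formulas(rows))
```

An empty result means that each molecular formula appears on a single line, so the merging issue described above cannot occur.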
Canvas, data visualization
1) The canvas tabs help produce different interactive representations of data. All parameters can be edited directly from this tab or from the "parameters" tab
2) To start easily with the canvas tab, load a Premade canvas (gear button).
3) Left menu handles general parameters. Middle menu handles charts parameters. Right menu handles data parameters and color scales
4) A canvas consists of 6 interactive chart spaces called "cells".
5) Many types of charts are available: to make Van Krevelen or DBE/#C plots, use the "scatter plot" option
6) For Kendrick maps, you can choose the repeating unit from a list, write its formula, or enter its mass.
7) When you make a selection on a chart it will highlight the same points on all the other charts from the same page
8) Selections on histograms will filter data and only display peaks that are included in the selection
9) You can copy selections or delete highlighted points by pressing ctrl+c or del, respectively.
10) When you are satisfied with your charts, you can export a screenshot (png or svg format)
11) You may also save your chart parameters or save everything including your data.
12) To reopen this data in a future session, use the large button at the top of the "upload" tab, "load a punc'data session"
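The quantities plotted in Van Krevelen, DBE/#C and Kendrick charts can be computed from element counts and m/z values. The sketch below uses common textbook definitions and hypothetical function names. The DBE formula is written for neutral formulae containing C, H, N and O, and the sign convention of the Kendrick mass defect varies between authors, so Punc'data's values may differ.

```python
def dbe(counts):
    """Double bond equivalents of a neutral formula containing C, H, N, O."""
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


def van_krevelen_point(counts):
    """(O/C, H/C) coordinates; requires at least one carbon atom."""
    carbon = counts["C"]
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


def kendrick_mass_defect(mz, unit_exact=14.01565, unit_nominal=14):
    """Kendrick mass defect of mz for a repeat unit (CH2 by default)."""
    kendrick_mass = mz * unit_nominal / unit_exact
    return round(kendrick_mass) - kendrick_mass


glucose = {"C": 6, "H": 12, "O": 6}
print(dbe(glucose), van_krevelen_point(glucose))
print(kendrick_mass_defect(179.05611), kendrick_mass_defect(179.05611 + 14.01565))
```

Two peaks that differ by the repeat unit get the same Kendrick mass defect, which is why homologous series line up horizontally on a Kendrick map.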
Types of Cells/Charts
Scatter plot: Represents each line of a table as a dot, with an optional link between the intensity and the area of the dot. Useful for Van Krevelen diagrams and DBE/#C plots
Mass spectra: Represents each line of a table as a thin line with (x;y)→(m/z;intensity)
Kendrick maps: Similar to a scatter plot, but with the Kendrick mass defect computed for the y axis (and m/z for the x axis)
Kendrick 2D: A 2D Kendrick map with two customizable mass defects. Alignments appear as horizontal lines and as overlaid dots
Contour map: Similar to a scatter plot, but represents the density of the points with a contour map. The bandwidth controls the smoothing and the thresholds control the number of "steps" for the density
Table of data: Computes averages for each column of your table
Histogram: Classifies all lines of a table based on a column criterion. The y axis can be calculated as a % of total lines, a % of total intensity or an absolute number of lines
Histogram discrete: Similar to histogram, but also valid for textual variables, and the bars can be sorted in order of occurrences
Histogram of Classes: Similar to the "Classes" tab. Represents a histogram of the different classes present in the sample based on the classes you've defined
Density maps: Acts as a 2D histogram. The height of each bar is represented by a color. It gives similar representations to the contour map
PCA variables : Shows data for a treated PCA matrix with column names containing "component". Similar to scatter plots in its behaviour
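The three y-axis options of the histogram (absolute number of lines, % of total lines, % of total intensity) differ only in what is summed per bar. The sketch below illustrates this with hypothetical mode names and rows given as (category, intensity) pairs.

```python
from collections import defaultdict

MODES = ("count", "percent_lines", "percent_intensity")


def histogram(rows, mode="count"):
    """Aggregate (category, intensity) rows into bar heights for a given mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    totals = defaultdict(float)
    for category, intensity in rows:
        totals[category] += intensity if mode == "percent_intensity" else 1
    if mode == "count":
        return {category: int(value) for category, value in totals.items()}
    grand_total = sum(totals.values())
    return {category: 100 * value / grand_total for category, value in totals.items()}


rows = [("CHO", 30.0), ("CHO", 10.0), ("CHNO", 60.0)]
for mode in MODES:
    print(mode, histogram(rows, mode))
```

A class with many weak peaks dominates in % of lines, while a class with a few intense peaks dominates in % of intensity.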
Networks
1) Networks rely on a physics simulation based on the d3.js library. It is recommended to use data files with at most 4000 peaks to avoid long computations
2) Main parameters are displayed on the left menu. On the right menu you can modify nodes or edges rules.
Shortcuts
•Clicking on a chart will highlight points (increase their intensity and add an outline) on all the charts simultaneously. If you make a selection on a histogram, it will hide all points outside of this selection.
•Shift+click zooms on the chart. A double click will reset the zoom.
•After making a selection, pressing DEL will either delete the selection or crop the chart, at the user's choice.
•ctrl+click when hovering data will pinpoint the tooltip
•You can change the data displayed in tooltips in the "parameters" tab, right column
•ctrl+c when you made a selection will copy the selected data. A popup will appear to confirm it
•By pinpointing a tooltip on a histogram or density map, a button appears to copy the data that makes up this bar
Tips:
•The data are drawn in reverse order: the first line of your matrix will be drawn on top of the others (and may hide them). On Canvas A, B and Stat you can change this order by sorting the data in the dataset menu (middle-right)
•Each file is drawn on top of the previous one, so file 2 will be drawn before file 3 on the same chart and will thus appear 'under' it
•The treatment tab will display the last deleted points, even if you deleted points from a selection.
•If you want to make titles and graduations disappear, go to the parameters tab and put "0" in the font size section.
•You can move label titles in the PCA tab by holding ctrl while hovering over the labels
•When you change the visualized file, you can then scroll quickly between files by using the up and down arrow keys ▲ ▼
Punc'data has been developed in a French public laboratory to serve research purposes concerning complex samples.
Punc'data is free software: you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation, version 3.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
The GNU General Public License is provided along with the source code. You can also find it at
https://www.gnu.org/licenses
Punc'data V.1.15.7
Web Version